Executive Summary


Because Congress passed the Every Student Succeeds 
Act (ESSA) last December, states are revamping their 
federally required systems to measure school quality 
and hold schools accountable for performance. But 
most are doing so using outdated assumptions, 
holdovers from the Industrial Era, when cookie-cutter
public schools followed orders from central
headquarters and students were assigned to the 
closest school. Today we are migrating toward 
systems of diverse, fairly autonomous schools of 
choice, some of them operated by independent 
organizations. Before revising their measurement and 
accountability systems, states need to rethink their 
assumptions. 


For instance, most states have assumed that they should apply one standardized, 
statewide accountability system to almost all public schools. Most have also 
assumed that measurement and accountability systems are roughly the same 
thing, so the only aspects of performance they need to measure are those in their 
federally-required accountability systems. Under the old No Child Left Behind (NCLB) 
Act, most of those measures were standardized test scores, and what counted was 
the percentage of students scoring "proficient" or better. When schools repeatedly 
failed to meet such standards, most states assumed the proper response was some
minor form of restructuring required by NCLB: perhaps a new principal, perhaps
some new teachers, perhaps some new money.


None of these assumptions will produce the schools our children need in the 21st
century. NCLB was an important step in its time, institutionalizing an expectation
that states would hold schools accountable for the learning of all their children,
including the poor, minorities, and those with special needs. But it relied on the
fairly blunt tools used by most states back in 2001: primarily achievement scores
on standardized math and reading tests. In the intervening 15 years, more tools
have become available, and even more are the subject of intense research today.
Fortunately, the ESSA has opened the door to these new approaches.



If we want 21st century schools, we must hold them accountable to 21st century
standards. For too long, we have defined and measured school quality in a way 
that encourages cookie-cutter schools, all focused on preparing students for tests. 
Instead, we need diverse schools that cultivate the joy of learning, engage their 
students in deep learning, and help them develop the "character skills"— such as 
conscientiousness and self-control— that lead to success in life. 


To achieve this will require a series of fundamental changes: 

We need accountability systems that focus on more than minimal standards and
treat different schools differently. It still makes sense for states to adopt minimum
standards that will trigger consequences for most schools if they consistently 
fail to meet them. After all, we want every child to learn to read, reason, do basic 
math, write coherently, and gain some familiarity with science, technology, history, 
geography, and civics. If children are not learning these things, should we really be 
using taxpayers' dollars to fund their schools? 

We must always remember, though, that these are minimum standards. Beyond 
them, states should encourage districts and authorizers to negotiate more specific 
performance goals and measures that reflect the missions of individual schools. If 
a school is designed to provide STEM (science, technology, engineering, and math) 
education, for instance, it should be judged on how well it does so. If it is designed to 
provide dual language immersion, or career and technical education, its performance 
measures should reflect those goals. 

Such performance agreements can motivate every school, whereas minimum state 
standards have little effect on schools whose students regularly score above the
minimums. Performance agreements also encourage people to open innovative
schools designed to serve different kinds of students. When we apply standardized 
accountability to all schools, we do just the opposite— and we get far less innovation. 

We need systems that make accountability real and powerful by replacing failed 
schools with proven models that fit students' needs. NCLB allowed states and 
districts to impose consequences for failure that had few teeth, and the ESSA could
well make the problem worse. But if states want to improve student learning, real 
accountability is one of their most powerful tools. Experience has proven that the 
most effective way to turn around a failing school is to replace it— to bring in an 
entirely new team with a strong track record and a new vision for the school. This 
helps motivate staff at every school: when everyone in a school knows that their 
jobs will disappear if students are not learning enough, they usually work together to 
make it happen, no matter the obstacles. 

In national charter school studies, the states and cities with high performing 
charters are those that actively monitor quality and close or replace schools whose 
students are falling behind. Places that failed to do this (until recently), such as 
Arizona, Texas, and Ohio, had charters that performed no better, and sometimes 
worse, than traditional public schools on standardized tests. 


State measurement systems should be broader than accountability systems. 

People often forget the distinction between measurement and accountability, but 
it is critical. Accountability systems create consequences for school performance: 
both rewards and penalties. Measurement systems provide information about those 
schools, without consequences attached. Both are necessary, but they are hardly
identical.



There are many things we measure about schools, and some we should begin
measuring, because the information is useful. Some information helps parents make
better choices, such as the number of Advanced Placement courses a high school
offers. Other information, such as data on parental involvement, also helps
districts, authorizers, and schools learn what works and what doesn't. But
statewide accountability systems should focus on outcomes we
care about as a society, while leaving schools free to choose the methods that work
best for their students and teachers.

Statewide accountability systems should put only half their weight on test scores.

Since the mid-1990s, our state accountability systems have been dominated by test
scores. NCLB, which required states to hold their schools accountable for delivering
"proficiency" on standardized tests, intensified the problem. The U.S. Department
of Education gave all but a handful of states waivers from NCLB, allowing them to
measure student growth as well as proficiency levels. But by 2016, according to the
Center for American Progress, the average state gave test scores (achievement and growth
combined) 91 percent of the weight in elementary and middle school ratings and
about 70-75 percent in high schools. Those numbers are far too high.

Don't get me wrong: Test scores are important measures of success. Without them, 
how will we know if students are learning to read and do math? Beginning with sixth 
grade, we should also test writing (tested by at least five states by 2015), science (at
least 13 states), and the social sciences (a majority of states). If we don't measure
these things, how will we know which schools are failing and need to be replaced? 

But that doesn't mean we should rely on test scores for three quarters or more of 
what matters. Testing experts agree that scores bounce around from year to year, 
so we need to be careful how we use them. Relying so heavily on test scores creates 
myriad problems. 

Our accountability systems should emphasize student growth more than
achievement levels. Standardized tests usually measure the level at which a student
performs, not the gains he or she has made over the past year. Yet we cannot judge 
the performance of a school or teacher without the latter data. If a school's students 
arrived four years behind grade level, on average, and two years later they are only 
one year behind grade level, is the school failing? Of course not. This may have been 
NCLB's biggest flaw: it required states to measure students' achievement levels but 
not their rate of growth. And despite their waivers, the majority of states still give 
greater weight to proficiency than growth in their measurement systems— and in 
making decisions about intervening in low performing schools. This puts schools 
with high percentages of low-income students at a huge disadvantage, because 
their test scores are lower. 

States should quit using "proficient" as the only target. Under NCLB, states were 
also required to measure the percentage of students who reached some cut score, 
usually labeled "basic" or "proficient." To make their schools look better, too many 
states lowered the proficiency bar. And as more than 40 experts in testing argued 
in a recent letter to Education Secretary John B. King Jr., the NCLB approach had
myriad other flaws. It failed to distinguish between students who were right at the 
cut score and those who were far above it, for one. It gave no credit to gains made 
by students who remained below the cut score, no matter how large. And it created 
incentives for schools to concentrate on raising the scores of those just below the
cut score, the "bubble kids," as they became known. This led to neglect of both the
lowest and highest achieving students. 

If we hold schools accountable for the growth of all their students, in contrast, they 
will be pressured to help those who are far behind while also providing challenging 
material for their advanced students. Hence states should use average scores or 
proficiency indexes, which reflect the percentage of students who reach each level of 
the scoring system. 
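
To make the contrast concrete, here is a minimal Python sketch comparing a percent-proficient measure with a proficiency index. The four level names, the point values assigned to each level, and the sample counts are illustrative assumptions, not figures from any state's system.

```python
# Hypothetical point values per performance level (an assumption for illustration).
LEVEL_POINTS = {"below_basic": 0.0, "basic": 0.5, "proficient": 1.0, "advanced": 1.25}


def percent_proficient(counts):
    """Share of students at or above the 'proficient' cut score, as a percentage."""
    total = sum(counts.values())
    at_or_above = counts.get("proficient", 0) + counts.get("advanced", 0)
    return 100.0 * at_or_above / total


def proficiency_index(counts, points=LEVEL_POINTS):
    """Average points earned per student, times 100; every level contributes."""
    total = sum(counts.values())
    earned = sum(points[level] * n for level, n in counts.items())
    return 100.0 * earned / total


# Two made-up years at one school: many students move from below basic to basic,
# but no one crosses the proficient cut score.
year_1 = {"below_basic": 60, "basic": 20, "proficient": 15, "advanced": 5}
year_2 = {"below_basic": 30, "basic": 50, "proficient": 15, "advanced": 5}

for label, counts in (("year 1", year_1), ("year 2", year_2)):
    print(label, percent_proficient(counts), proficiency_index(counts))
```

In this made-up example the percent-proficient figure stays at 20 in both years, while the index rises from 31.25 to 46.25. That movement below the cut score is exactly what a single "proficient" target ignores.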

States should construct their systems as works in progress, to be adjusted as
they learn how to objectively and reliably use student surveys and measure deeper
learning, character skills, and other important aspects of school performance. We
should experiment with all of these aspects of measurement to learn what works,
where the pitfalls lie, and how to overcome them. If some authorizers and districts
want to include them in their charters and performance agreements, and the
schools in question agree, states could also learn from their experience.


AN IDEAL STATEWIDE ACCOUNTABILITY SYSTEM 

I would suggest that today's statewide rating systems, those applied to all
schools, have five or six basic elements, weighted roughly as follows. (The balance
between achievement and growth should depend on which method states use to
measure growth; with some value-added methods, achievement and growth can
be combined in one value-added score.) ESSA requires that states also use English
learner progress toward proficiency, but I have not specified a recommended weight
because that should vary by school. In some, with many English language learners, it
would be quite important; in others, with none, it would be unimportant.

For high schools:

Student academic achievement: 20 percent
Student academic growth: 25 percent
English learners' progress toward proficiency: variable
Student engagement: 10 percent
Qualitative school assessments by experts: 15 percent
Student outcomes: 25 percent

Elementary and middle schools would use only five elements:

Student academic achievement: 20 percent
Student academic growth: 30 percent
English learners' progress toward proficiency: variable
Student engagement: 10-20 percent
Qualitative school assessments by experts: 25-30 percent

Indicators for each element could include the following:

Academic achievement and growth: 

Test scores in math, ELA, writing, science, and the social sciences 

For English language learners, scores on tests designed to measure their 
progress in learning English 

PSAT, SAT, and/or ACT scores and/or state-approved international test scores 
Industry certifications 

Qualitative assessments 

Expert site visit assessments 

Student engagement 

Parent surveys 

Outcomes (for high schools) 

HS Graduation rate: 4 year, 5-7 year, and with GED (perhaps 2, 2, and 1 percent 
for each category, respectively) 

Quality of diploma, if states offer different diplomas 

Percent of graduates enrolling in college 

Percent of enrollees required to take remedial classes in college

Percent of college enrollees persisting to second and third years 

Percent of two-year college enrollees completing a two-year degree or credential 

Percent of non-college-bound graduates employed, in training, or in the military 

Income levels for those employed full-time

Ideally, states should give various weights to each indicator and sum them to give a
grade for each area, plus an overall grade. Some states use colors or phrases, such 
as "meets expectations," but parents understand grades on an A-F scale better. If we 
use grades for our children, we should have the courage to use them for the adults 
who run our schools. 
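
As a rough sketch of how such a weighted rating might be computed, the following Python example sums element scores using the high school weights listed above and converts the result to a letter. The 0-100 scale for element scores, the grade cutoffs, the sample scores, and the choice to renormalize the weights when an English learner weight is added are all assumptions made for illustration; the text specifies only the weights.

```python
# High school element weights from the list above, in percent.
# English learner progress is "variable" in the text, so it is passed in separately.
HIGH_SCHOOL_WEIGHTS = {
    "achievement": 20,
    "growth": 25,
    "engagement": 10,
    "qualitative_assessment": 15,
    "outcomes": 25,
}

# Hypothetical A-F cut points on a 0-100 scale (an assumption; the text gives none).
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def overall_score(element_scores, weights, el_weight=0):
    """Weighted average of 0-100 element scores.

    Weights are divided by their total, which is one possible way (an
    assumption) to absorb the variable English learner weight.
    """
    all_weights = dict(weights)
    if el_weight > 0:
        all_weights["english_learner_progress"] = el_weight
    total_weight = sum(all_weights.values())
    weighted = sum(all_weights[name] * element_scores[name] for name in all_weights)
    return weighted / total_weight


def letter_grade(score):
    """Map a 0-100 score to a letter using the hypothetical cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Made-up element scores for one school.
scores = {
    "achievement": 60,
    "growth": 85,
    "engagement": 70,
    "qualitative_assessment": 80,
    "outcomes": 75,
    "english_learner_progress": 90,
}

score = overall_score(scores, HIGH_SCHOOL_WEIGHTS, el_weight=5)
print(score, letter_grade(score))
```

The same function could produce a grade for each area by passing in indicator-level scores and weights, so a school would receive area grades plus an overall grade.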

The kind of system I have described, with school performance agreements in
addition to minimum statewide standards that include qualitative assessments
by experts, may seem like a fantasy to those steeped in the world of NCLB. But
such systems already exist. In Massachusetts, charter schools must meet minimal state
standards, but their charters also include specific goals. When a charter is up
for renewal, a team of experts visits for a day and a half and writes a qualitative
assessment report on the school, which the state board uses in making its decision.
In Washington, D.C., the Public Charter School Board uses a performance framework
much like the one I have advocated, but individual charters include goals specific to the
schools and reviews include multiple site visits to assess the quality of schools.
Denver Public Schools does much the same thing with its charters. In New Orleans,
charters must meet minimum state standards, and both the Recovery School
District and the Orleans Parish School Board do on-site reviews every year, plus a
high-stakes review when charters are up for renewal.

All three cities are among the fastest improving in the nation, and Massachusetts 
has one of our highest performing charter sectors. Are these not the kinds of results 
we want for all our children?

In the wake of Congress's passage of the
Every Student Succeeds Act (ESSA) last
December, states are busy revamping their
federally-required systems to measure school
quality and hold schools accountable for
performance. But most are doing so using 
outdated assumptions: holdovers from the 
Industrial Era, when cookie-cutter public schools 
followed orders from central headquarters 
and students were assigned to the closest 
school. In today's world, that is no longer the 
norm. We are migrating toward systems made 
up of diverse, fairly autonomous schools of 
choice, some of them operated by independent 
organizations, as charter, contract, or innovation 
schools. Before revising their measurement and 
accountability systems, states need to rethink 
their assumptions. 

For instance, most states have assumed they 
should apply one standardized, statewide 
accountability system to almost all public 
schools. Most have also assumed that 
measurement and accountability systems are 
roughly the same thing, so the only aspects of
performance they need to measure are those in
their federally-required accountability systems. 
Under the old No Child Left Behind (NCLB) Act, 
most of those measures were standardized test 
scores, and what counted was the percentage 
of students scoring "proficient" or better. Finally, 
when schools repeatedly failed to meet such 
standards, most states assumed the proper 
response was some minor form of restructuring 
required by NCLB— perhaps a new principal, 
perhaps some new teachers, perhaps some new 
money. 

None of these assumptions will produce the 
schools our children need in the 21st century.
NCLB was an important step in its time,
institutionalizing an expectation that states
would hold schools accountable for the learning
of all their children, including the poor, minorities,
and those with special needs. But it relied
on the fairly blunt tools used by most states
back in 2001: primarily achievement scores on
standardized math and reading tests. And by
setting the target at "proficiency" and ignoring
student growth, it penalized any school full of
low-income children. In the intervening 15 years,
more tools have become available, and even
more are the subject of intense research today. 
Fortunately, the ESSA has opened the door to 
new approaches. 

THE ESSA'S NEW RULES 

Under the ESSA, states are still required to
give greater weight to test scores than other
indicators in their measurement systems, but
they have significantly more leeway. The new
law requires states to measure 1) student
performance in math and English language arts,
or ELA, 2) a second academic indicator, such as
growth in math and ELA, 3) the progress English
language learners make toward proficiency in
English, 4) graduation rates from high school,
and 5) at least one other measure of school
quality or student success. (States can require
schools to measure other things as well, of
course.)

States have to publish the results (excluding 
number 3 above) for each school and for 
these subgroups at each school: students 
with disabilities, students from low-income 
families, students from major racial and ethnic 
groups, and English language learners. In rating 
schools, states must give "substantial weight" to 
categories 1-4 above and "much greater weight"
to those four combined than to number 5. Their 
rating systems must sum the scores in these 
categories to create one rating or score for each 
school. 

For Title I schools in the bottom five percent,
plus high schools with four-year graduation
rates below 67 percent, states must provide
"comprehensive support and intervention,"
though the act leaves it mostly up to the
states to determine what that entails. 1 (Title I
schools are those in which at least 40 percent
of students qualify for a free or reduced-price
lunch.) If a subgroup at a school falls in the
bottom five percent of scores for all students
at Title I schools, the state must provide
"targeted support and intervention" to the
school. Unfortunately, the ESSA's requirements
for determining schools needing both forms of
support are unworkable and need amendment
or liberal interpretation by U.S. Department
of Education regulations, which are still being
finalized. 2
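
As a rough illustration of the two triggers for comprehensive support summarized above, the following Python sketch flags Title I schools in the bottom five percent of an overall score and high schools with four-year graduation rates below 67 percent. The `School` record, its field names, the use of a single overall score, and rounding the five percent cutoff up to a whole number of schools are all assumptions; the act and its regulations define the actual procedure.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class School:
    name: str
    title_i: bool
    overall_score: float                   # hypothetical summative rating
    grad_rate_4yr: Optional[float] = None  # percent; None if not a high school


def needs_comprehensive_support(schools: List[School]) -> List[str]:
    """Return names of schools flagged by either trigger summarized in the text."""
    flagged = set()

    # Trigger 1: Title I schools in the bottom five percent by score.
    title_i = sorted((s for s in schools if s.title_i), key=lambda s: s.overall_score)
    n_bottom = math.ceil(len(title_i) / 20)  # five percent is 1 in 20; round up (assumption)
    flagged.update(s.name for s in title_i[:n_bottom])

    # Trigger 2: high schools with four-year graduation rates below 67 percent.
    flagged.update(
        s.name for s in schools
        if s.grad_rate_4yr is not None and s.grad_rate_4yr < 67
    )
    return sorted(flagged)
```

A state would still have to decide how to break ties at the cutoff and what the resulting support entails, which the sketch leaves open.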

States may set "alternate academic achievement 
standards" for students with the most significant 
cognitive disabilities and give them alternate 
tests, provided no more than one percent of 
all students in the state are assessed this 
way. This accommodation will thus apply to
fewer than 10 percent of students who receive
special education. States may also allow
multiple student assessments through the year
rather than one year-end test, and they may
include student portfolios in the assessments.
Up to seven states will be allowed to pilot
competency-based assessments and other
innovations.

ESSA requires that this framework be applied 
to all schools and 99 percent of students, but it 
leaves it largely up to states to define how they 
will use the measures to create consequences, 
within broad guidelines. In other words, it 
dictates at least part of a state's measurement 
and rating systems but not the rest of its 
accountability system. 

Some states may use their new freedom to avoid 
holding schools accountable for performance. 
This would be a huge mistake. Experience has 
shown that creating real consequences for 
school performance, including replacement 
by another school, is a key component of 
high-performing systems. In national charter
school studies, the states and cities with
high-performing charters are those that actively
monitor quality and close or replace schools
whose students are falling behind. Places where
charter authorizers did not do this until recently,
such as Arizona, Texas, and Ohio, had charters 
that performed no better, and sometimes worse, 
than traditional public schools on standardized 
tests. 3 

In revising their accountability systems, states 
should learn from the dramatic success of 
emerging 21st century systems, such as those
in New Orleans, Washington, D.C., Denver,
Indianapolis, and Boston. New Orleans, where all
but six public schools are charters, has improved
faster than any other district in the state, and
probably the nation as well. 4 Washington, D.C.,
where 45 percent of students attend charters,
has improved faster on the National Assessment
of Educational Progress (NAEP) than all 20 other
big cities that participate, with charters leading
the way. 5 (New Orleans does not participate
in NAEP.) Denver Public Schools, which has
embraced charters and added an equally large
sector of "innovation schools," which it treats
somewhat like charters, has moved from the
slowest academic growth among Colorado's
20 largest cities in 2005 to the fastest since
2012. 6 Indianapolis's 37 charters, authorized
by the mayor, vastly outperform its traditional 
public schools. 7 The traditional district has 
now embraced "innovation network schools," 
some of which are charters and all of which 
enjoy charter-like autonomy. And Boston's 
charter sector, which benefits from exemplary 
authorizing by the state board of education, is 
the highest performing in the nation, producing 
more than twice as much learning per year as 
Boston Public Schools. 8 

If we want 21st century schools, we have to hold
them accountable to 21st century standards.
For too long, we have defined and measured
school quality in a way that encourages 
cookie-cutter schools, all focused on preparing 
students for tests. Instead, we need diverse 
schools that cultivate the joy of learning, 
engage their students in deep learning, and help 
them develop the "character skills"— such as 
conscientiousness and self-control— that lead to 
success in life. 

As I will argue below, we need: 

• accountability systems that focus on more
than minimal standards and treat different
schools differently;

• systems that make accountability real and
powerful by replacing failed schools with
proven models that fit students' needs;

• state measurement systems that are broader
than their accountability systems;

• accountability systems that put only half
their weight on test scores and emphasize
student growth more than achievement
levels; and

• systems that are constructed as works in
progress, to be adjusted as we learn how
to objectively and reliably measure deeper
learning, character skills, and other important
aspects of school performance.

ACCOUNTABILITY SYSTEMS THAT TREAT 
DIFFERENT SCHOOLS DIFFERENTLY 

The common belief that the same accountability 
system should be applied to every public school 
is outdated. In a traditional 20th century district, 
in which all schools operated in similar fashion 
and sought to educate all types of children, 
it may have made sense. But in 21st century
systems, with parents choosing between many 
diverse schools designed to educate different 
kinds of learners, it no longer does. 

It still makes sense for states to adopt minimum 
standards that will trigger consequences for 
most schools if they consistently fail to meet 
them. After all, we want every child to learn to 
read, reason, do basic math, write coherently, and 
gain some familiarity with science, technology, 
history, geography, and civics. If children are not 
learning these things, should we really be using 
taxpayers' dollars to fund their schools? 

We must always remember, though, that 
these are only minimum standards. Beyond 
them, states should encourage districts 
and authorizers to negotiate more specific 
performance goals and measures that reflect 
the missions of individual schools. If a school is 
designed to provide STEM (science, technology, 
engineering, and math) education, for instance, 
it should be judged on how well it does so. If it is 
designed to provide dual language immersion, or 
career and technical education, its performance 
measures should reflect those goals.

Should alternative schools for dropouts
and overage students be held to the same 
standards as ordinary high schools? How about 
schools designed specifically for students with 
disabilities? Or schools for students returning 
from the criminal justice system? Obviously, we 
need different standards for such "alternative" 
schools. While states should measure and rate 
these schools, just as they do all others, they 
should not impose consequences. Instead, 
districts or authorizers should hold alternative 
schools accountable for performance goals 
and measures they have negotiated with each 
individual school, or group of schools if more
than one uses the same model.

We must be careful, however, that districts 
and authorizers don't label too many schools 
"alternative" and dump their worst students 
in them to make the rest of their schools look 
good. State departments of education should 
take a close look at any district or authorizer 
that has more than 10 percent of its students in
alternative schools. 

Holding each school accountable to its own 
standards has always been the heart of the 
charter model— though not all states are 
faithful to that model. A charter should be a 
performance contract, which spells out what 
the school intends to accomplish, how it will be 
measured, and what will happen if the school
fails to achieve its goals. Such agreements can
reflect the unique characteristics of a school 
far better than statewide standards can. And 
when schools are held accountable to their own 
goals, negotiated with their own authorizer or 
district, their leaders and staff members are 
far more likely to embrace responsibility for 
accomplishing them. 

Such performance agreements can also 
motivate every school, whereas minimum 
state standards have little effect on schools 
whose students regularly score above the 
minimums. Our accountability systems should 
motivate every member of every school to seek 
improvement in student outcomes— even those 
who work in schools for gifted students. 

Finally, by crafting different performance goals 
for different kinds of schools, we will encourage 
people to open innovative schools designed to 
serve different kinds of students. When we apply 
standardized accountability to all schools, we do 
the opposite— and we get far less innovation. 

Consider University Preparatory Academies 
in inner-city Detroit. When founder Doug 
Ross opened his first high school, in 2003, 
he adopted a model heavily influenced by Big 
Picture Learning— particularly the MET School, 
based in Providence, Rhode Island, Big Picture's 
first. Aware that his biggest challenge was 
creating motivation for college among inner-city,
African-American teenagers who had rarely
met anyone, other than their teachers, who had 
ever been to college, Ross decided that every 
high school student would spend two days 
a week in internships with local businesses, 
public agencies, or non-profits. It worked: when 
students saw African-American adults who had 
graduated from college and had good jobs, nice 
houses, and nice cars, a light bulb often went off.
On top of that, students discovered they could
contribute in meaningful ways, which boosted 
their confidence, and some of them fell in love 
with particular fields of work. But when Michigan 
adopted statewide standards and imposed them 
on all schools, prompted by NCLB, University 
Prep had to cut internships back to half a day 
a week, to ensure that students covered all the 
material on state tests. 


MAKE ACCOUNTABILITY REAL BY REPLACING 
FAILED SCHOOLS WITH PROVEN MODELS 

Another mistake NCLB made was allowing 
states and districts to impose consequences 
for failure that had few teeth. The ESSA could 
well make the problem worse. But, if states want 
to improve student learning, real accountability 
is one of their most powerful tools. As noted 
above, experience has proven the most effective 
way to turn around a failing school is to replace 
it— to bring in an entirely new team with a strong 
track record and a new vision for the school. 

This helps motivate staff at every school: when 
everyone knows their jobs will disappear if 
students are not learning enough, they usually 
work together to make it happen, no matter the 
obstacles. This is one of the key things that 
separates strong charter states from weak 
ones. 9

When schools fail to meet state standards or 
their more specific performance goals for two or 
three years, states should provide resources to 
help them hire the help they need to turn things 
around, as the ESSA suggests. If schools cannot 
do so within two years, however, authorizers and 
districts should replace them with more effective 
school operators, as they do in New Orleans,
D.C., Denver, and other places. Before pulling the
trigger, they should give the schools a chance to
appeal: to provide compelling evidence that they
are actually succeeding, given the students they 
educate. Sometimes a school may have a high 
percentage of children with serious disabilities, 
or a high percentage of former dropouts or 
homeless children. Sometimes a middle school 
or high school may be helping children who 
arrived three or four years behind grade level 
achieve decent academic growth, while missing 
minimum state standards. 

In some cases, if it's a close call, the district 
or authorizer might want to renew the school 
for only a year or two, to give it time to solve 
whatever problems exist. If the school's
argument is not compelling, however— or if it 
fails to turn things around during its extension— 
authorizers should replace the failing school 
with a better one. 


ACCOUNTABILITY IS NOT THE SAME AS 
MEASUREMENT 

People often forget the distinction between 
measurement and accountability, but it is critical. 
Accountability systems create consequences for 
school performance: both rewards and penalties. 
Measurement systems provide information 
about those schools, without consequences 
attached. Both are necessary, but they are hardly 
identical. 

There are at least three kinds of accountability. 
The first is well known: measurement by 
the state and consequences of some kind 
if performance is exceptional or falls below 
minimum standards for several years in a 
row. The second is also relatively familiar: 
consequences (either positive or negative) 
imposed by parents who react to performance 
information by changing their children's schools, 
taking public dollars with them. The third is
less powerful but also important: the pride or 
embarrassment administrators and teachers 
feel when their schools are shown to be of high 
or low quality. All three can stimulate school 
leaders and their staffs to work together to 
improve student outcomes. 

There are many things we measure about 
schools— and some we should begin 
measuring— because the information is useful. 
Some information helps parents make better 
choices, such as the number of Advanced 
Placement courses a high school offers. Other 
information also helps districts, authorizers, 
and schools learn what works and what 
doesn't— such as data on parental involvement, 
or student surveys about teacher and school 
quality, or student and teacher assessments 
of character skills such as self-control and 
conscientiousness. Such data may even play a 
role in school, district, and authorizer decisions 
about where to invest and what policies to 
adopt. But there are good reasons to keep it out 
of an accountability system. 

Accountability systems should focus on what 
we as a society most want our schools to 
accomplish: real-world outcomes such as 
graduation rates, college-going and employment 
rates, and student acquisition of knowledge
and skills. But to help those who manage
schools, districts, and charter networks— as 
well as parents facing choices— we also need 
information about inputs and outputs, such 
as attendance rates, teacher absenteeism and 
turnover, student-teacher ratios, numbers of 
Advanced Placement courses, and so on. Indeed, 
every school should track its own "balanced 
scorecard," including data about student results, 
employees' views and experiences (morale, 
absenteeism, turnover, etc.), operational issues 
(spending, learning time, productivity, etc.), and 
customers' views (parental satisfaction, student 
engagement, and the like). Principals and 
teachers should examine such data in regular 
group sessions and use it to make changes that 
will improve performance. 

If schools or networks of schools choose 
to use data about such things for internal 
accountability, that is their prerogative. Districts 
and authorizers may want to include any of 
these measures in performance agreements 
with particular schools, where it makes sense. 
But states should limit what they hold all schools 
accountable for to a handful of key outcomes 
that truly matter in all students' lives. And they 
should give schools the freedom to figure out the 
best methods to achieve those outcomes, given 
the particular students they educate. This is the 
formula that has worked so well in our fastest 
improving school systems. 

Here are a few examples of data that states
should require districts and authorizers 
to measure but not include in statewide 
accountability or rating systems: 

Student demand. For most schools, demand
reflects parental judgments about the school's 
quality. But some schools are designed for 
specialized populations— pregnant students, 
dropouts, overage students, or special education
students. It would be silly to punish such schools 
because demand was low or dropping, since 
lower pregnancy or dropout rates might be a 
sign of success, not failure, for the city or school 
system. However, districts and authorizers 
should feel free to include negotiated goals 
about demand in their performance agreements 
with individual schools, where it makes sense. 

Retention rates. Some districts and authorizers 
measure the rate at which schools retain 
students, but this is another number that should 
not be attached to ratings or consequences. 
Some demanding schools lose students 
because nearby schools are easier. It would be 
insane to punish them for that. 

Discipline rates. The same is true of rates of 
discipline. How could a statewide standard ever 
apply to every public school? How could a state 
agency judge whether a school was using the 
ideal type and amount of discipline? As charter 
authorizers in D.C. and elsewhere have shown, 
publishing data on discipline rates is useful— 
to keep schools honest, to encourage them 
to deal with the trauma that often underlies 
student misbehavior, and to nudge them to use 
methods such as restorative justice rather than 
suspensions and expulsions. But we need to 
leave judgments about discipline up to the people 
who run schools. Students in one school may 
disrupt class frequently, so high rates of discipline 
may be required to ensure that students can 
learn uninterrupted. Students in another school 
may rarely disrupt class and thus need little 
disciplinary action. Any effort to punish schools 
for high rates would undermine their ability to 
deal with the realities in their classrooms. 

College-level courses. Some states, districts, 
and authorizers also give credit in their
performance frameworks for the number of
Advanced Placement (AP) classes, International 
Baccalaureate (IB) programs, and/or high school 
students taking college classes through dual 
enrollment. Again, this is good information to 
measure and publicize, because it will help 
parents and students make informed choices. But 
how can anyone say that all schools should offer 
more such opportunities? Big Picture Learning 
Schools have concluded that internships in local 
businesses, nonprofits, and government offices 
are more valuable for their students. Given their 
impressive outcomes, they are surely correct. 

It would be silly to create incentives for them 
to limit internships and offer more AP courses. 
Different schools, with different kinds of students, 
need different methods. Statewide accountability 
systems should focus on outcomes and leave the 
choice of methods to schools. 


STATES SHOULD NOT RELY SO HEAVILY ON 
TEST SCORES 

Since the mid-1990s, our state accountability
systems have been dominated by test scores. 
NCLB, which required states to hold their 
schools accountable for delivering "proficiency" 
on standardized tests, intensified the problem. 
President Obama's Department of Education gave 
all but a handful of states waivers to NCLB, to 
measure student growth as well as proficiency 
levels. But by 2016, according to a study by
the Center for American Progress, the average 
state gave test scores (achievement and growth 
combined) 91 percent of the weight in elementary 
and middle school ratings and about 70-75 
percent in high schools. 10 Those numbers are far 
too high. 

Don't get me wrong: Test scores are important 
measures of success. Without them, how will 
we know if students are learning to read and
do math? Beginning with 6th grade, we should
also test writing (tested by at least five states by
2015), science (at least 13 states), and the social
sciences (a majority of states). 11 If we don't
measure these things, how will we know which 
schools are failing and need to be replaced? How 
will we make sure most poor children don't end 
up in dropout factories? 

But that doesn't mean we should rely on test 
scores for three quarters or more of what
matters. Testing experts agree that scores 
bounce around from year to year, so we need to 
be careful how we use them. 

Relying so heavily on test scores creates myriad 
problems. One of the most important, articulated 
by social scientist Donald Campbell in 1976, 
has become known as Campbell's Law: "The 
more any quantitative social indicator is used
for social decision making, the more subject it 
will be to corruption pressures and the more 
apt it will be to distort and corrupt the social 
processes it is intended to monitor." We have 
seen both corruption and distortion due to 
NCLB's accountability system: adults cheating 
on standardized tests and schools concentrating 
on rote learning to drive up test scores, 
undermining children's natural love of learning in 
the process. 

That does not mean we should stop 
standardized testing. Campbell was not 
discouraging measurement; he was warning us 
not to rely on a single measure of quality. "Many
commentators, including myself, assume that 
the use of multiple indicators, all recognized as 
imperfect, will alleviate the problem," 12 he added 
in the very same essay. 

In addition, we all know people who perform 
well in life and work but who do not test well—
because of stress, learning disabilities, trauma,
or myriad other issues. As one teacher told David 
Kirp, author of Improbable Scholars, about Union 
City's schools in New Jersey, "The expectation 
is that 10-year-olds can write five paragraphs in
half an hour, solve complicated math problems 
and have a wealth of knowledge about science, 
and do it all entirely on their own. But these 
youngsters freeze up under stress— one word on 
a question can throw them off...." 13

One of my daughters taught in a K-8 school in 
New Orleans, where students had to achieve 
certain test scores to move from 4th to 5th 
grade and 8th to 9th. The school gave teachers 
rubber gloves along with the tests, because 
students often threw up from the stress. Fights 
were always more common during testing week, 
because students wanted to be suspended and 
sent home so they could avoid the test. 

Yet another problem is that standardized tests 
often give misleading signals about students 
who are still learning English. Kristina Rizga, 
author of Mission High, describes a student at 
San Francisco's Mission High School named 
Maria, from El Salvador. For years, Maria failed 
standardized tests because she still struggled 
with English. But, by the time she applied for 
college, she had mastered the language, and she 
won two scholarships and was accepted at five, 
including the University of California at Davis. 

Some studies find a correlation between good 
test scores and success in adult life, but others 
find no connection, and the question is still hotly
debated. 14 In 2012, Education Sector compared
college enrollment rates at 21 randomly selected 
California high schools with their state Academic 
Performance Index (API) scores, which were 
based almost entirely on test scores. "In the 
sample of 'typical' schools," they found, "most
high-scoring API schools also tend to have 
higher postsecondary enrollment and most low- 
scoring API schools have lower postsecondary 
enrollment. But, in the sample of high-poverty 
schools, the relationship between high API 
scores and high college enrollment rates all but 
disappears." 15 

They described San Francisco's June Jordan 
School for Equity, "a 250-pupil school founded
in 2003 to serve some of the city's poorest
neighborhoods," which had a "dismal" API
in 2010 of only 568 out of 1000. (The state
considered 800 a good score.) 

But also that year, June Jordan ranked 
second among San Francisco high 
schools in the percentage of students 
eligible for the UC/CSU system, behind 
only the prestigious, admissions-based 
Lowell High School. Among its 2009 
graduates, 70 percent enrolled in college 
overall, and 49 percent enrolled in four- 
year colleges-higher enrollment rates 
than the district average. The graduates 
are also persisting in college. Fully 100
percent of its 2007 graduates who entered
two-year colleges re-enrolled for a second
year, and 85 percent of its 2008 graduates
did so. For graduates entering four-year
colleges, the figures are 100 percent for
2007 grads and 97 percent for 2008.

Standardized tests can also push schools to 
concentrate more on memorization than on 
deeper learning. But tests can also measure 
aspects of deeper learning, and those developed 
to measure the Common Core standards have 
improved the situation. 16 Most experts believe 
the Common Core tests move in the right 
direction, though some argue they still need 
more questions that show how deeply students
can apply, analyze, and evaluate what
they know. 17

Schools that focus deliberately on deeper 
learning often sacrifice their standardized 
test scores, because they don't prepare their 
students for such tests. For instance, a dozen
teacher-run charter schools in Minnesota,
operated or assisted by a teacher cooperative
called EdVisions, use project-based learning
to maximize student engagement. According
to a 2010 study, they had lower scores on
standardized tests than the state average
but higher ACT scores (23.6 compared to a
national average of 21.2) and SAT scores (1749
compared to a national average of 1518). More
than 82 percent of their graduates entered two- 
or four-year colleges, compared to a national 
average of 68 percent. 18 

Standardized tests also fail to measure "non- 
cognitive" or "character" skills that are important 
for future success, such as self-control, 
conscientiousness, and the ability to work well 
with others. Former Minnesota Governor Rudy
Perpich, who, in the 1980s, brought public school
choice to Minnesota (and hence to America),
used to say, "I've seen too many people who
passed tests and failed life. And too many people 
who failed tests and passed life." 19 

Finally, too heavy a reliance on standardized 
testing in accountability systems can discourage 
people from creating schools for particularly 
challenging students, such as dropouts, children 
with disabilities, those convicted of crimes, 
or those who don't speak English. It also
discourages schools from trying new methods—
such as project-based learning, student 
internships, or career and technical education— 
that might deepen learning but hurt test scores. 
We need more innovation in our schools, not 
less; we need to encourage people to start
schools that are unique, aimed at students who 
do not fit well in cookie-cutter schools. 

The American people understand this. Every 
year Gallup and Phi Delta Kappa collaborate 
on a survey to measure public opinion about 
education. In 2015, 64 percent of those surveyed
agreed there was too much emphasis on 
standardized testing in their public schools. 
When asked about different measures of school 
effectiveness, almost four in five said "how 
engaged students are with their classwork" and
"the percentage of students who feel hopeful 
about their future" were "very important," and 
69 percent added high school graduation rates. 
More than twice as many said the percentage 
of graduates going on to college or jobs was 
"very important" than said the same about 
standardized test scores. But 48 percent agreed 
test scores were "somewhat important" in
improving schools, while another 19 percent
said they were "very important." Less than a
third said they were not important or "not very
important." 20

In other words, Americans see the value in 
standardized testing, but they see more value in 
measuring graduation rates, employment rates, 
and student engagement and attitudes. 

There is wisdom here. I believe that 
standardized test scores (including college 
readiness tests such as the ACT and SAT 
exams) should be given roughly half the weight 
in statewide measurement and accountability 
systems, depending on the school level. (I 
will suggest specific percentages below, after 
discussing some of the alternatives.) But 
they should be balanced by other important 
measures. 


GIVE MORE WEIGHT TO ACADEMIC GROWTH 
THAN ACHIEVEMENT LEVELS 

Standardized tests usually measure the level 
at which a student performs, not the gains he 
or she has made over the past year. Yet we 
cannot judge the performance of a school or 
teacher without the latter data. If a school's 
students arrived four years behind grade level, 
on average, and two years later they are only one 
year behind grade level, is the school failing? Of 
course not. This may have been NCLB's biggest 
flaw: it required states to measure students' 
achievement levels but not their rate of growth. 
This put schools with high percentages of 
low-income students at a huge disadvantage, 
because their test scores were so much lower. 

Under waivers granted by the Bush and Obama 
administrations, all but five states added growth 
measures in English and math. But the majority 
of states still give greater weight to proficiency 
than growth in their measurement systems— and 
in making decisions about intervening in low-
performing schools. 21

There are many different ways to measure 
student growth. Jurisdictions using student 
growth percentiles— a popular method first 
developed by Colorado— need to balance them 
with equal weight for proficiency. Because
these systems only compare students to those 
who have had similar test scores in the past, a 
student can score above average while falling 
further behind grade level, because his "peer 
group" performs so poorly. 22 
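A small sketch can make this concrete. It assumes a deliberately simplified growth percentile: the share of a student's peers (those with similar prior scores) whom the student outscores this year. The function name, the scale scores, and the grade-level benchmarks are all hypothetical; actual student growth percentile models are more elaborate and draw on several years of prior scores.

```python
def growth_percentile(student_now, peers_now):
    """Percent of peers (students with similar prior scores) who score below the student."""
    below = sum(1 for score in peers_now if score < student_now)
    return 100.0 * below / len(peers_now)

# Hypothetical scale scores. The grade-level benchmark rises from 300 to 320.
benchmark_last_year, benchmark_this_year = 300, 320
student_last_year, student_this_year = 260, 270

# Current scores of ten hypothetical peers who had similar scores last year.
peers_this_year = [255, 258, 262, 265, 268, 272, 275, 261, 259, 266]

sgp = growth_percentile(student_this_year, peers_this_year)
gap_last_year = benchmark_last_year - student_last_year  # points behind grade level last year
gap_this_year = benchmark_this_year - student_this_year  # points behind grade level this year

print(f"growth percentile: {sgp:.0f}")
print(f"gap to grade level: {gap_last_year} -> {gap_this_year}")
```

In this made-up case the student outscores eight of ten peers, yet the distance to the grade-level benchmark widens from 40 points to 50, which is why growth percentiles need a proficiency measure alongside them.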

Another popular option is a "value-added 
model," which attempts to isolate the 
contribution a school makes to student gains 
by controlling for student characteristics such 
as socioeconomic status— thus putting all 
schools on an equal playing field. One popular
value-added model is the pioneering method 
developed by William L. Sanders in the early 
1990s for Tennessee, the Tennessee Value-
Added Assessment System (TVAAS), a version
of which the firm SAS now markets as the 
"EVAAS Multivariate Response Model." 23 Another 
is a two-step value-added model advocated 
by a group of economists at the University of 
Missouri. This method first performs a value- 
added analysis, then compares each school 
to schools and teachers serving students with 
similar characteristics— income levels, race, 
English language ability, and so on. Whereas 
student growth percentiles and some single- 
step value-added models produce results in 
which schools with higher-income students 
have an advantage, this two-step model 
appears to eliminate any such advantage. 24 

There are a few other alternatives used by 
states, 25 but my purpose here is not to analyze 
which methods are superior. It is simply to argue
that any academic performance scores must
focus more on growth than on proficiency levels. 

STOP FOCUSING SO HEAVILY ON THE 
PERCENTAGE OF STUDENTS WHO ARE 
PROFICIENT OR ABOVE 

Under NCLB, states were required to measure 
the percentage of students who reached some 
cut score, usually labeled "basic" or "proficient." 
To make their schools look better, too many 
states lowered the proficiency bar. And, as more 
than 40 experts in testing argued in a recent 
letter to Education Secretary John B. King Jr.,
the NCLB approach had myriad other flaws. It 
failed to distinguish between students who were 
right at the cut score and those who were far 
above it, for one. It gave no credit to gains made 
by students who remained below the cut score, 
no matter how large. And it created incentives 
for schools to concentrate on raising the scores 
of those just below the cut score— the "bubble 
kids," as they became known. 

Focusing only on proficiency also led to neglect 
of both the lowest and highest achieving 
students. In contrast, if we hold schools 
accountable for the growth of all their students, 
they will be pressured to help those who are 
far behind while also providing challenging 
material for their advanced students. There are 
a surprising number of the latter: A recent study 
by the Institute for Education Policy found that 
20-40 percent of elementary and middle school 
students perform at least one grade level above 
their grade in reading, while 11-30 percent do
so in math. That equates to 1.4 to two million
students in California alone. 26

NCLB gave schools incentives to ignore such 
students, and some of them did. According to 
the Fordham Institute: 

Research has demonstrated that students 
just below the bar were most likely to 
make large gains in the NCLB era, while 
high achievers made lesser gains. Those 
most victimized by this regime were high- 
achieving poor and minority students- 
kids who were dependent on the school 
system to cultivate their potential and 
accelerate their achievement. (Equally 
able youngsters from middle-class 
circumstances have other people and 
educational resources to keep them 
moving forward.) 27

In their letter, the experts urged the Education 
Department to write regulations allowing states 
to use either of two approaches under ESSA:
1) average scores for each grade and subject,
or 2) proficiency indexes, which would reflect 
the percentage of students who reached each 
level in the scoring system— not just one level, 
"proficiency." Either would give a truer picture of 
school performance than proficiency rates. 28 

The Fordham Institute recommended the latter, 
an "achievement index that gives schools partial 
credit for getting students to 'basic,' full credit 
for getting students to 'proficient,' and additional 
credit for getting students to 'advanced.'" 29 
It noted that eight states already used this 
kind of index, under NCLB waivers. 30 It also 
recommended that states add "high-achieving 
students" as a subgroup on which they report 
results, just like the other subgroups required by
ESSA. 
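To illustrate the difference between a proficiency rate and an achievement index, here is a minimal sketch. The credit values (0.5 for basic, 1.0 for proficient, 1.25 for advanced) and the two schools' student counts are hypothetical, chosen only to show the mechanics; states that use such indexes set their own weights.

```python
# Hypothetical credit per performance level; actual state weights vary.
CREDIT = {"below_basic": 0.0, "basic": 0.5, "proficient": 1.0, "advanced": 1.25}

def proficiency_rate(counts):
    """Share of students at 'proficient' or above (the NCLB-style measure)."""
    total = sum(counts.values())
    return (counts["proficient"] + counts["advanced"]) / total

def achievement_index(counts):
    """Average credit per student, with partial and additional credit by level."""
    total = sum(counts.values())
    return sum(CREDIT[level] * n for level, n in counts.items()) / total

# Two hypothetical schools of 100 students each.
school_a = {"below_basic": 40, "basic": 10, "proficient": 45, "advanced": 5}
school_b = {"below_basic": 10, "basic": 40, "proficient": 25, "advanced": 25}

for name, counts in (("A", school_a), ("B", school_b)):
    print(name, proficiency_rate(counts), achievement_index(counts))
```

Both hypothetical schools have a 50 percent proficiency rate, but the index separates them: School B has moved far more students out of the lowest level and into the highest one, and only the index registers that.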


Some states also use growth models that only 
include "growth to proficiency"— hence excluding 
all students who are already above proficient 
levels. 31 Schools that know they are being judged 
this way tend to ignore the needs of those 
advanced students. Leslie Jacobs, a veteran 
of almost two decades on New Orleans's and 
Louisiana's school boards, puts it this way: "If 
you don't measure it, it doesn't count. And if kids 
don't count, they will be ignored." 

WHAT ELSE SHOULD STATES USE IN 
ACCOUNTABILITY SYSTEMS? 

Graduation rates. NCLB required states to
measure graduation rates, so most use a four- 
year adjusted cohort graduation rate, which 
includes all those who start 9th grade at a high 
school but subtracts those who transfer to 
another high school. Some states have included 
five-year, six-year, and even seven-year rates. 32 
This is wise, because we want high schools to 
work hard to help students graduate, even if 
it takes longer than normal. At least a third of 
those who enroll in college are not ready— they 
have to take remedial courses, and many of 
them later drop out. 33 Some of the nation's best 
charter schools— which want their graduates to 
be ready for college— require students to do an
extra year if they have not fulfilled all graduation
requirements in four years. We should reward
such behavior, not punish it. Extended year
graduations should receive equal weight with
four-year graduations; there should be no 
assumption that one is better than another. 
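The arithmetic behind a cohort rate can be sketched briefly. This follows the simplified description above (entering 9th graders minus transfers out) and is an assumption-laden illustration, not the official federal calculation; the function name and all counts are hypothetical.

```python
def cohort_graduation_rate(entering_9th_graders, transfers_out, graduates):
    """Graduates divided by the adjusted cohort (entrants minus transfers out)."""
    adjusted_cohort = entering_9th_graders - transfers_out
    return graduates / adjusted_cohort

# Hypothetical school: 200 students start 9th grade, 20 transfer elsewhere.
four_year_rate = cohort_graduation_rate(200, 20, graduates=135)

# Eighteen more students from the same cohort graduate after a fifth year.
five_year_rate = cohort_graduation_rate(200, 20, graduates=135 + 18)

print(f"four-year rate: {four_year_rate:.0%}")
print(f"five-year rate: {five_year_rate:.0%}")
```

A system that counts only the four-year figure would credit this hypothetical school with 75 percent and ignore the students it helped finish in a fifth year; giving extended-year graduations equal weight credits it with 85 percent.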

Some states award special diplomas to 
recognize high achievement— "distinguished 
achievement" programs in Texas, 34 for example, 
or "regents diplomas with advanced designation 
with honors" in New York. 35 Some of them award 
points in their performance indexes for the
number of high-achievement diplomas. 36 Florida, 
Indiana, Louisiana, Maryland, New Mexico, 
Oklahoma, and New York City also give credit 
for students who have earned industry-based 
certifications, to incent schools to make such 
options available. 37 Both are excellent ideas. 

College entrance and persistence rates.
Ultimately, the most important factor in judging
schools should be how they prepare students for 
success in life. Graduating from college is often 
an important milestone on that path. Indeed, the 
value of a college degree has increased steadily 
in the Information Age, while the value of a high 
school degree has fallen, as the graph below 
shows. (The graph focuses on male earnings,
but the data for female workers looks much
the same.) For most, college has become the 
gateway to a middle-class career. 

Hence many charter authorizers and a few
states include the percentage of graduates who 
enter college in their measurement systems. 38 
In addition, Denver measures the percentage of 
college entrants who must take remedial classes, 
an important indicator everyone should use. 39 
The percentage who move on to a second year, 
the percentage entering a two-year college who 
earn a two-year degree or credential, and the 
percentage moving on to a third year are also 
worth measuring. We should give high schools 
an incentive to actively help their graduates make 
it through college, as many of the best charters
do. Unfortunately, using college graduation rates
is probably unfair, because college graduation
occurs so long after high school graduation. A
school might make huge strides in helping its
graduates prepare for and get through college,
but the improvement would take four to six years
to show up in graduation rates.

FIGURE 1: Change in Real Wage Levels of Full-Time Male Workers by Education, 1963-2012

Training and employment rates. There are 
many skilled positions in our economy that 
do not require college degrees; instead, they
require some technical training, whether through
an apprenticeship, a training program, or a 
community college. And many students have 
no desire to go to college. For some, finding 
a full-time job (or joining the military) after 
high school is an indicator of success. States 
should measure employment and training rates 
(including military service) for graduates who 
do not go on to college and include them in 
their accountability systems. They should also 
include the income levels of recent graduates 
(from the past two years) who are employed full 
time. They should be careful not to put so much 
weight on college enrollment that they give 
high schools an incentive to ignore career and 
technical education that leads to further training, 
apprenticeships, and jobs. These outcomes 
should be given substantial weight as well. 

Qualitative assessments by experts. In England, 
small teams of experts, many of them former 
school leaders and teachers, visit each school 
roughly every three years, with two days' notice,
and spend two to three days gauging its quality.
They sit in on classes; examine student work; 
talk with groups of students, staff, and members 
of the governing board; look over documents,
records, and test scores; review parent surveys;
solicit written input from parents; and often 
meet with parents. Then they publish reports— 
distributed to all parents— full of qualitative
judgments, rating areas such as "quality of
teaching, learning and assessment," "outcomes 
for pupils," "early years provision," "effectiveness 
of leadership and management," and "personal 
development, behavior and welfare." There are 
four possible ratings: outstanding, good, requires
improvement, and inadequate. The work is
overseen by the Office for Standards in Education, 
Children's Services and Skills, an independent 
government agency created in 1992.

New York City adopted a similar model to 
evaluate its own schools; Vermont is doing 
the same; Massachusetts, Indianapolis, and 
others use visits based on the English model 
in reviewing their charter schools; and large 
charter networks such as KIPP have done 
the same. 40 This approach can yield valuable 
information about school quality that test scores
don't reveal— particularly about school culture 
and aspects of deeper learning, such as critical 
thinking, problem solving, researching, and 
speaking skills. 

Obviously, inspections such as this are more 
expensive than standardized tests, though 
they don't have to be done every year. If we 
want a balanced set of quality measures that
reflect the whole child's experience, however, 
they are indispensable. We already spend a lot 
accrediting most public schools every six to 
ten years (depending on the region), paid for by 
the schools. It is a voluntary process that relies 
heavily on a "self-study" by the school, and, 
in some regions, it focuses less on academic 
outcomes than on facilities and process 
(guidance services, curriculum, instructional 
model, etc.). Whether school leaders implement 
the accrediting agency's recommendations 
depends entirely on them, and accrediting 
agencies rarely refuse a public school 
accreditation. As one former superintendent 
who has been through multiple accreditations
and also done charter reviews put it, "If we're not 
keeping score, what are we doing?" 41 

Ted Sizer, the late headmaster of the elite Phillips 
Academy, author of Horace's Compromise, 
founder of the Coalition of Essential Schools, 
and chair of the Brown University Education 
Department, told me that the assessment 
done when he ran a charter school, at the end 
of his career, was far more valuable than any 
accreditation process he had ever been through. 
In some regions, what we spend on accreditation 
would be more productively spent on an English- 
style assessment of each school every three 
years. If it costs more than accreditation every 
six years, which is unlikely, the investment would 
still be worthwhile. Accountability is an arena 
in which we should never be penny wise and 
pound foolish. 

In large states, the scale of these qualitative 
assessments might need to be phased in over
multiple years. But organizations already exist 
that know how to do them, and there are plenty 
of retired teachers and administrators who 
would be happy for the part-time work. With 53 
million people, England is far larger than any 
state, and the English government has managed 
just fine for decades. New York City is larger 
than most states, and it has done likewise. 

Since states are highly unlikely to embrace this 
method all at once, there would rarely be more 
than one large state launching such a system at 
any given time. 

Student engagement, measured by parent
surveys. In most industries, customer
satisfaction is a key indicator of quality.
Education is no exception: both parents and
students have important perspectives on the
quality of their schools. There is some risk in
using student surveys, as I will discuss below.
But there is less risk with parent surveys, 
because parents are more likely to express 
their true feelings about their children's schools 
and less likely to comply with principals' and 
teachers' wishes. 

Attendance rates are another measure of 
student engagement, but they are easily 
manipulated by schools and difficult to audit 
effectively. 

IMPORTANT MEASURES NOT YET READY FOR 
PRIMETIME 

Many critics of standardized testing rightly point 
out that tests cannot capture many valuable 
aspects of student learning. For instance, a good 
test can capture some aspects of what people 
call deeper learning, but not all. Some schools 
and districts use other methods, but they are 
difficult to standardize when applied to many 
schools. For these and the two other forms 
of assessment below, we should experiment, 
moving as fast as possible to develop reliable, 
objective measures. States should treat their 
measurement and accountability systems as 
works in progress, to be improved as we learn 
more. 

Qualitative assessments of student 
performance. New Hampshire is gradually 
moving to a system in which students advance 
by proving their competence in a subject matter, 
rather than putting in seat time and passing 
courses. In 2015, the state received a federal 
waiver to work with eight school districts to 
develop performance tasks to assess student 
learning, in place of standardized tests. The pilot, 
the Performance Assessment of Competency 
Education (PACE), is designed to develop 
assessments more in line with competency-based 
education. Some grades take the state's 
standardized (Smarter Balanced) tests, but 
others are assessed with multi-step tasks that 
seek to measure deeper learning. For instance, 
geometry students at Spaulding High School 
were asked to design two water towers that 
would each hold about 45,000 cubic feet of 
water— one a simple solid and one a compound 
solid— with the least amount of construction 
materials possible, then to write a proposal 
recommending the best approach. 42 

Deeper learning assessments are common 
in other developed countries, but they're just 
beginning to spread in the U.S. 43 The Council 
of Chief State School Officers' Innovation Lab 
Network is working with schools and districts 
in a dozen states to assess Common Core 
skills through "performance-based measures 
of deeper learning." 44 And the New York 
Performance Standards Consortium, a group 
of 38 small high schools that focus on deeper 
learning, has a state waiver allowing them to 
use continuous, project-based assessments. 
Students write essays and research papers, 
solve math problems, do science experiments, 
and orally present their work to external 
assessors. All but two of the schools are in New 
York City. With demographics similar to the rest 
of the city's high schools, their dropout rates 
are half what the city's are and their college 
acceptance rates are almost 30 percentage 
points higher. 45 


Unfortunately, all these approaches involve 
subjective assessment of student work, and 
it is difficult to ensure that such assessments 
are standardized across thousands of schools. 
Other nations, including the Netherlands, 
Singapore, and parts of Australia, train teachers 
and education professors who serve on 
assessment panels, using common rubrics. 

The results are audited to ensure that roughly 
the same standards are being applied in all 
schools. 46 If research on these efforts can find 
a method that proves objective and reliable 
when used statewide, it would enrich our 
accountability systems. For now, though, all we 
can do is encourage such research. 

Student surveys. In the U.S., most colleges 
use student surveys as part of professors' 
evaluations, but K-12 schools rarely do. Yet, 
when they respond to surveys honestly, students 
have proven to be very accurate barometers 
of teacher and school quality. Starting in 2010, 
the Bill and Melinda Gates Foundation funded 
research involving 3,000 teachers in six urban 
school districts to identify the most accurate 
measures of teaching quality. They analyzed 
test scores, videotaped 20,000 teacher lessons, 
studied thousands of classroom observations 
of teaching, and implemented dozens of student 
surveys. They asked tens of thousands of 
students to respond to statements on a survey 
developed through a decade of research by 
Dr. Ronald Ferguson of Harvard University, 
in collaboration with teachers, students, and 
colleagues. 47 The survey asked students to 
agree or disagree with statements like, "My 
teacher knows when the class understands, and 
when we do not"; "When I turn in my work, my 
teacher gives me useful feedback that helps me 
improve"; and "In this class, the teacher accepts 
nothing less than our full effort." Students' 
answers correlated with how much they had 
learned (measured by growth on standardized 
tests). Indeed, they were more reliable than 
ratings by trained observers who watched videos 
of classrooms. And they bounced around less 
than test scores, over time. 48 

At least 100 districts and a thousand schools 
are now using the Tripod survey, as Ferguson 
and his company call it, as part of their teacher 
evaluation systems. Some are attaching 
consequences. In Pittsburgh, Pa., for instance, 
survey results collected multiple times over 
two years account for 15 percent of a teacher's 
overall evaluation score. 49 Tripod Education 
Partners has also developed and used student 
surveys to deliver feedback on "school climate," 
"student engagement," "peer support," and 
character skills such as "conscientiousness." 

If surveys are used to create real consequences, 
however, schools and teachers may push 
students to emphasize the positive. Some 
students may also use their answers to punish 
teachers who are more demanding or tougher 
graders— a phenomenon well known to college 
professors. According to Ferguson's partner, Rob 
Ramsdell, Tripod finds very consistent patterns 
across classrooms, even in districts that give 
15 percent of the weight in teacher evaluations 
to surveys. If students were not answering 
honestly, he says, this would not be the case. 

But he agrees we can't rule out the possibility 
of such problems. If you've ever bought a new 
car and had the salesperson tell you to expect 
a phone call from the company asking you to 
rate his performance— and that his entire bonus 
rests on your answer— you're familiar with one 
potential problem. 

With standardized tests, states analyze data 
and investigate instances where the unexpected 
happens, to detect cheating. They could do the 
same with student surveys. Normally, survey 
results are fairly consistent over time, and 
they correlate with students' academic gains 
on tests. When surveys suddenly show a big 
change or more positive results than tests, 
states could investigate. 
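
The screening idea in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the 0-100 survey scale, the direct comparison of a survey score with a test-growth percentile, and the two thresholds (`jump` and `gap`) are all hypothetical and are not drawn from any state's actual audit procedure.

```python
from dataclasses import dataclass


@dataclass
class SurveyRecord:
    school: str
    survey_prev: float  # last year's mean survey score (hypothetical 0-100 scale)
    survey_now: float   # this year's mean survey score (same scale)
    growth_pct: float   # school's percentile on test-score growth, 0-100


def flag_for_review(records, jump=15.0, gap=25.0):
    """Return schools whose survey results changed sharply from last year,
    or look far rosier than their test-score growth.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for r in records:
        sudden_change = abs(r.survey_now - r.survey_prev) >= jump
        rosier_than_tests = (r.survey_now - r.growth_pct) >= gap
        if sudden_change or rosier_than_tests:
            flagged.append(r.school)
    return flagged


records = [
    SurveyRecord("School A", 70.0, 72.0, 68.0),  # stable and consistent with tests
    SurveyRecord("School B", 55.0, 80.0, 74.0),  # 25-point jump in one year
    SurveyRecord("School C", 78.0, 80.0, 40.0),  # surveys far above test growth
]
print(flag_for_review(records))  # ['School B', 'School C']
```

A flag here would mean only that a closer look is warranted. A sudden improvement can be genuine, so a rule of this kind could prompt an investigation but should not decide its outcome.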

For now, every school should use feedback 
from its customers— its students and their 
families— to improve its performance. If the data 
is trustworthy, there are few better measures 
of school quality. States should require that the 
data be collected and distributed to schools 
and parents, to help them choose appropriate 
schools for their children. But, given the risk 
of schools influencing student responses— or 
students punishing demanding teachers— we 
should not yet include student surveys in 
statewide accountability systems. Instead states 
should experiment, to learn what works, where 
the pitfalls lie, and how to overcome them. If 
some authorizers or districts want to include 

CREATING MEASUREMENT AND ACCOUNTABILITY SYSTEMS FOR 21 ST CENTURY SCHOOLS 


survey results in their charters and performance 
agreements— and the schools agree— states 
could also learn from their experience. 

Finally, if states use qualitative assessments by 
experts, as the English do, assessment teams 
could include parental and student survey data 
in their analyses. If the data looked fishy, they 
could discount it and ask the state, district, or 
authorizer to investigate. 

Assessments of non-cognitive skills. In 
recent years scholars have focused increasing 
attention on character skills not measured 
by standardized tests, such as self-control, 
persistence, and conscientiousness. These 
are also known as non-cognitive skills, social- 
emotional skills, or habits of success. Both 
common sense and academic research suggest 
they are extremely important in determining 
whether a student will succeed in later life. 

In a recent paper, Transforming Education's 
Chris Gabrieli, Dana Ansel, and Sara Bartolino 
Krachman recounted the research findings: 

In the Dunedin Multidisciplinary Health 
& Development Study [in a city in New 
Zealand], 95 percent of the young people in 
the top quintile of self-control were likely to 
graduate from high school, compared with 
58 percent for those in the lowest quintile 
and about 80 percent for those in the next 
two quintiles. In James Heckman's 2006 
analysis of the [U.S.] National Longitudinal 
Survey of Youth from 1979, non-cognitive 
factors were as equally predictive as 
cognitive factors in accounting for which 
young men earned a college degree by age 
30. In the Fast Track longitudinal study, 
kindergartners with high social competency 
were 1.5 times more likely to graduate from 
high school and twice as likely to graduate 
from college. Among the Dunedin Study 
cohort, those from the lowest quintile of 
self-control in their elementary school years 
were more than three times as likely as 
those in the highest quintile of self-control 
to ever have been convicted of a crime (43 
percent versus 13 percent). 50 

The authors summed up the research results 
this way: 

Academics 

1. Non-cognitive skills predict high school and 
college completion. 

2. Students with strong non-cognitive skills have 
greater academic achievement within K-12 
schooling and college. 

3. Fostering non-cognitive skills as early as 
preschool has both immediate and long-term 
impact. 

Career 

1. Employers value non-cognitive skills and seek 
employees who have them. 

2. Higher non-cognitive skills predict a greater 
likelihood of being employed. 

3. Stronger non-cognitive skills in childhood 
predict higher adult earnings and greater 
financial stability. 

Well-Being 

1. Adults with stronger non-cognitive skills 
are less likely to commit a crime and be 
incarcerated. 

2. Strong non-cognitive skills decrease the 
likelihood of being a single or unplanned 
teenage parent. 

3. The positive health effects associated 
with stronger non-cognitive skills include 
reduced mortality and lower rates of obesity, 
smoking, substance abuse, and mental health 
disorders. 51 

Educators have long understood the importance 
of character skills, of course. A 2013 national 
teacher survey found that 93 percent agreed 
it was important for schools to promote these 
skills, while 88 percent said their schools were 
already trying to do so. 52 Indeed, schools have 
long graded students' conduct, and increasingly 
they create incentives for good behavior and 
penalties for poor behavior. Many charter 
schools focus heavily on helping students 
develop a series of values. Summit Public 
Schools, a charter network in California and 
Washington state, asks students and their 
teachers to fill out surveys on students' habits of 
success twice a year, and mentors then initiate 
discussions about them with their students. 53 

According to Ted Dintersmith and Tony Wagner, 
authors of Most Likely to Succeed, employers 
also recognize how important social-emotional 
skills are: 

Google, for instance, changed its hiring 
strategies after Laszlo Bock, senior 
vice president of people operations, 
analyzed their data and found no 
correlation between job performance 
and an employee's GPA, SATs, or college 
pedigree. Google now considers an 
applicant's ability to collaborate and to 
perform authentic job-related challenges. 

Now, they hire many new employees who 
never went to college. 

Our education goals have lost touch with 
what matters most-helping students 
develop essential skills, competencies, 
and character traits. It's time to reimagine 
the goals for U.S. education, and hold all 


schools-from kindergarten through college 
-accountable for teaching the skills and 
nurturing the dispositions most needed for 
learning, work, and citizenship. 54 

Who could disagree? The problem is that 
almost all measurement of non-cognitive skills 
is done through surveys. (The other option is 
observations by trained personnel, which would 
be quite expensive.) Surveys raise several 
issues, one of which is known as "reference 
bias." Imagine a demanding school, with a lot 
of homework, and another school that is more 
laissez-faire, with little homework. If we ask 
students and teachers at both schools to rate 
kids on conscientiousness, those at the first 
school will likely have much higher expectations, 
so they will not rate themselves or their students 
as highly as those in the laissez-faire school 
would. Harvard Professor Martin West says 
it well: "To the extent that students attending 
schools with more demanding expectations for 
student behavior hold themselves to a higher 
standard when completing questionnaires, 
reference bias could make comparisons of their 
responses across schools misleading. If schools 
with high expectations are actually more effective 
in improving students' non-cognitive skills 
(something not yet known but often assumed), 
conclusions about school performance based on 
self-reports could even be precisely backward." 

And what would happen if teachers started 
talking with students a great deal about things 
like self-control, conscientiousness, and 
persistence? Would some kids decide, "I'm just 
low on persistence, and I always will be"? Experts 
who promote the development of non-cognitive 
skills worry about that. According to Gabrieli, 
lowering kids' self-perception on these skills 
leads to lower grades, lower test scores, and 
worse behavior. 

Finally, if a state made school scores on these 
skills part of the measurement of school quality, 
would teachers start coaching students to 
influence the way they answered surveys? What 
other perverse behavior would emerge? 

We don't know the answers to these questions. 
But we may soon begin to find out, because 
six large districts in California, including 
San Francisco, Los Angeles, and Oakland, 
volunteered to measure non-cognitive skills as 
part of their measurement and rating system, 
and the Obama administration gave them 
waivers to do so. By late 2016, the experiment 
was entering its third year. These "CORE" 
districts use a School Quality Improvement 
Index that includes test scores, graduation 
rates, suspension and expulsion rates, chronic 
absenteeism, and school culture and climate 
surveys by students, teachers, and parents. 

But it also includes student surveys on four 
habits of success: growth mindset (the belief 
that one's abilities can grow with effort); self- 
efficacy (a belief in one's ability to meet goals); 
self-management (the ability to control one's 
emotions); and social awareness (interpersonal 
skills such as empathy, collaboration, and the 
ability to listen). The results account for eight 
percent of the School Quality Improvement 
Index, though by 2016 the districts had not 
attached any consequences. According to 
Martin West, who led research efforts examining 
the results, the first year of non-cognitive skills 
data showed the expected correlation between 
social-emotional skills, grade point averages, 
standardized test scores, absenteeism, and 
suspensions, suggesting that the measures 
were fairly accurate. 55 

Clearly, we don't know enough yet to use such 
data as part of accountability systems that 
impose consequences on schools. But these 


skills are critical to success, and the best way 
to learn about measuring something is to start 
doing so. We need to measure student progress 
on social-emotional skills, learn more about their 
role in future success and the relative impact of 
home life vs. school life, develop better ways to 
measure these skills, and figure out how schools 
can improve them. 

In August 2016 the Collaborative for Academic, 
Social, and Emotional Learning (CASEL) 
announced it will help eight states "create 
and implement plans to encourage social-emotional 
learning in their schools." 56 But CASEL 
also warns it is too early to begin attaching 
consequences— or even to include the data in 
rating systems. For now, states should include 
non-cognitive skills only in their measurement 
systems. This information would be valuable 
to both schools and parents, when choosing 
schools. If a few states or districts want to 
experiment with including the data in their 
rating and accountability systems, as the 
CORE districts are, they could speed up our 
learning curve. Charter authorizers should 
also be encouraged to negotiate performance 
goals that include such measures with schools 
that are interested in being accountable for 
improving students' non-cognitive skills. But it 
is far too early to force accountability for such 
improvement on schools that don't want it. 

AN IDEAL STATE RATING AND 
ACCOUNTABILITY SYSTEM 

At the risk of repeating myself, state systems 
enforce the minimum levels of performance we 
will accept from our public schools. We need 
other means of accountability— performance 
agreements and charters— to create incentives 
for alternative schools, other schools that differ 
from the norm, and schools that routinely score 
above the minimum. 

That said, I would suggest that today's statewide 
rating systems— applied to all schools— should 
have five or six basic elements, weighted roughly 
as follows. (The balance between achievement 
and growth should depend on which method 
states use to measure growth; with some value- 
added methods, such as EVAAS, achievement 
and growth can be combined in one value-added 
score.) ESSA requires that states also 
use English learner progress toward proficiency, 
but I have not specified a recommended weight 
because it should vary by school. In some, with 
many English language learners, it would be 
quite important; in others, with none, it would be 
unimportant. 

For high schools: 

Student academic achievement: 20 percent* 
Student academic growth: 25 percent 
English learners' progress toward proficiency: variable 
Student engagement: 10 percent 
Qualitative school assessments by experts: 15 percent 
Student outcomes: 25 percent 

Elementary and middle schools would use only 
four elements: 

Student academic achievement: 20 percent* 
Student academic growth: 30 percent 
English learners' progress toward proficiency: variable 
Student engagement: 10-20 percent 
Qualitative school assessments by experts: 30 percent 
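
To make the arithmetic concrete, here is a minimal sketch of how the high school weights above might be combined into one overall score and letter grade. It rests on assumptions not found in the recommendations themselves: that each element has already been scored on a 0-100 scale, that the variable English learner weight is chosen school by school, that the score is divided by the total weight in use (the fixed weights listed sum to 95, leaving room for the variable element), and that letter grades follow hypothetical 90/80/70/60 cut points. All function and variable names are illustrative.

```python
# Fixed high school weights from the list above (percent).
HIGH_SCHOOL_WEIGHTS = {
    "achievement": 20,
    "growth": 25,
    "engagement": 10,
    "expert_assessment": 15,
    "outcomes": 25,
}


def composite_score(scores, weights, el_score=None, el_weight=0):
    """Weighted average of element scores, each assumed to be on a 0-100 scale.

    el_weight is the school-specific ("variable") weight for English learner
    progress; leave it at 0 for a school with no English learners.
    """
    w = dict(weights)
    s = dict(scores)
    if el_score is not None and el_weight > 0:
        w["el_progress"] = el_weight
        s["el_progress"] = el_score
    total_weight = sum(w.values())
    # Dividing by the total weight keeps the result on a 0-100 scale
    # whatever English learner weight a school is assigned.
    return sum(w[name] * s[name] for name in w) / total_weight


def letter_grade(score):
    """Map a 0-100 score to a letter using hypothetical cut points."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


scores = {
    "achievement": 60,
    "growth": 80,
    "engagement": 70,
    "expert_assessment": 90,
    "outcomes": 75,
}
overall = composite_score(scores, HIGH_SCHOOL_WEIGHTS)
print(overall, letter_grade(overall))  # 75.0 C
```

Giving the same school a 5 percent English learner weight and a score of 95 on that element, `composite_score(scores, HIGH_SCHOOL_WEIGHTS, el_score=95, el_weight=5)`, raises the overall score to 76.0. A state that adds pluses and minuses would simply use finer cut points.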


Indicators for each element could include the 
following: 

Academic achievement and growth: 

Test scores in math, ELA, writing, science, 
and the social sciences 

For English language learners, scores on 
tests designed to measure their progress in 
learning English 

PSAT, SAT, ACT, and/or state-approved 
international test scores 

Industry certifications 

Qualitative assessments 

Expert site visit assessments (see pp. 26-27) 

Student engagement 

Parent surveys 

Outcomes (for high schools) 

HS graduation rate: 4 year, 5-7 year, and with 
GED (perhaps 2, 2, and 1 percent for each 
category, respectively) 

Percent of graduates enrolling in college 

Percent of enrollees required to take remedial 
classes in college 

Percent of college enrollees persisting to 
second and third years 

Percent of two-year college enrollees 
completing a two-year degree or credential 

Percent of non-college-bound graduates 
employed, in training, or in the military 

Income levels for those employed full-time 

* Can be eliminated using certain value-added models for 
measuring growth, in which case all 45 or 50 percent of the 
weight would go to the value-added score. 

Measures Recommended for State Accountability Systems, 
Measurement, and Research 


INCLUDE IN A STATEWIDE ACCOUNTABILITY SYSTEM: 

Student academic growth and achievement, measured by: 
• Test scores in math, ELA, writing, science, and the social sciences 
• For English language learners, scores on tests designed to measure their progress in learning English 
• PSAT, SAT, and/or ACT scores and/or state-approved international test scores 
• Industry certifications 

Progress toward proficiency of English language learners 

Qualitative assessment, measured by: 
• Expert site visit assessments 

Student engagement, measured by: 
• Parent survey 

For high schools only: Student outcomes, measured by: 
• HS graduation rate: 4 year, 5-7 year, and with GED 
• Quality of diploma, if states offer different diplomas 
• Percent of graduates enrolling in college 
• Percent of enrollees required to take remedial classes in college 
• Percent of college enrollees persisting to second and third years 
• Percent of two-year college enrollees completing a two-year degree or credential 
• Percent of non-college-bound graduates employed, in training, or in the military 
• Income levels for non-college-bound graduates employed full-time 


REQUIRE DISTRICTS AND AUTHORIZERS TO MEASURE BUT DON'T INCLUDE IN STATEWIDE RATING OR ACCOUNTABILITY SYSTEM: 

• Student attendance rates 
• Rates of chronic student absenteeism 
• Rates of teacher absenteeism 
• Student surveys 
• Parent surveys 
• Student demand 
• Student retention 
• Teacher retention 
• Safety 
• Discipline rates 
• Numbers of advanced courses (AP, IB, dual credit, etc.) 
• Numbers of student internships 
• Student-teacher ratios 


FUND RESEARCH TO FIND OBJECTIVE, RELIABLE WAYS TO MEASURE: 

• Qualitative assessments of student performance (performance tasks, portfolios, etc.) 
• Student surveys 
• Assessments of non-cognitive "character" skills 

TO GRADE OR NOT TO GRADE? 

Ideally, states should give various weights to 
each indicator and sum them to give a grade 
for each area, plus an overall grade. Some 
states use colors or phrases, such as "meets 
expectations." But parents understand grades on 
an A-F scale better. (I would urge states to use 
pluses and minuses for more precision, just as 
schools do.) If we use grades for our children, 
we should have the courage to use them for the 
adults who run our schools. 

Some experts argue against a summative 
grade. California's Superintendent of Public 
Instruction, Tom Torlakson, and the president 
of its State Board of Education, Michael Kirst, 
expressed this view in a recent letter to the 
U.S. Department of Education: "A summative 
rating, in contrast, necessarily glosses over 
differences in performance across indicators 
and inappropriately draws school leaders, 
stakeholders, and the public to focus on the 
single rating rather than a more robust reflection 
of performance demonstrated by the individual 
indicators. We reach this conclusion having over 
15 years' experience with a single rating where 
the public paid little attention to the individual 
components that comprised that single rating." 57 

Without a single, summative grade for each 
school, however, accountability becomes 
squishy: schools face much less pressure to 
improve. Chad Aldeman of Bellwether Partners 
cites several examples to make the point: "When 
Wales dropped its school rating system, student 
achievement dropped significantly, particularly 
for lower-performing students. Similarly, after 
New York City dropped its A-F rating system and 
stopped applying pressure on low-performing 
schools, achievement in F-rated schools 
immediately fell." 58 And to say that parents don't 


have the ability to look at multiple indicators and 
the four or five grades that sum to the final grade 
sells them short. For years, we have asked them 
to look at five or six grades on a typical report 
card, not to speak of marks for conduct. Most 
parents can understand where a school is strong 
and where it is weak and make their decisions 
accordingly. 

"Summative ratings are all around us," Aldeman 
points out. 

If you want to go to a movie, you might 
consult a site like IMDb or Rotten 
Tomatoes. Cars, colleges, neighborhoods, 
restaurants, etc. You name it, if there's 
some sort of choice that people can 
make, there's probably at least one, if 
not more than one, rating system to help 
them decide. Even the National Education 
Association, which opposes the idea of 
rating schools, has its own A-F grading 
systems for individual legislators. 

... It's not just that summative ratings exist; 
they're also extremely popular. Consumer 
Reports is an entire magazine devoted to 
rating everyday household products, and it's 
been around since 1936 for a reason. 

...Summative ratings are simple and 
easy to understand, but they're not one- 
dimensional. All of the rating systems 
mentioned above have various factors that 
go into them (in education-speak, we might 
say they're based on "multiple measures"). 
And, while the overall rating provides 
a useful method for people to make 
decisions, none of these systems stop at a 
numeric rating. They all include much more 
information for people who want to dig in 
further. 

The ESSA requires states to report performance 
data also for subgroups: by gender, race, 
ethnicity, English language learners, low-income 
children, and those receiving special education. 
As the Fordham Institute recommends, states 
should add high-achieving students to this list. 

In all cases, they should average two years of 
data whenever possible, to smooth out annual 
variation and more accurately reflect school 
performance. 

Students who arrive at a school more than six 
weeks into an academic year should not be 
included in these measures. Schools should not 
be held accountable— or even measured— based 
on students they did not have an opportunity to 
educate for at least six months before a test. 

States must ensure the data is audited, analyzed, 
and spot-checked to detect cheating. In the 
past, districts and schools have been caught 
cheating on standardized tests and manipulating 
attendance, graduation, and dropout rates. 


Some districts have labeled their worst 
schools "alternative schools" to avoid negative 
consequences. The lesson: we should be on 
alert for efforts to manipulate all indicators. 

PRESENTING THE DATA TO THE PUBLIC 

States, districts, and authorizers should publish 
brief performance reports on each school, 
showing their scores and grades. These reports 
should include other information that is of value 
to parents and the public but is not included in 
the ratings, such as the number of students, the 
student-teacher ratio, demand for the school, the 
school's mission and focus, the percentage of 
students receiving special education, and the like. 

To put school grades in perspective, states and 
districts could also give schools a percentile 
rating based on their overall score— that is, an 
elementary school that outperformed 62 percent 
of all elementary schools would have a rating of 
62. (Utah, New York City, and Philadelphia have 
all done this in the past. 59) States would also be 
wise to divide schools into groups with similar 
demographics, then provide bar graphs to show 
how all those schools compare. Bar graphs give 
readers visual evidence of how a school stacks 
up against schools with similar students. One 
might be rated at the 62nd percentile but be 
very close in performance to schools at the 90th 
percentile, for instance. Another might be at the 
62nd percentile but be quite close to those at the 
40th percentile, because schools are bunched 
around the middle. Bar graphs reveal this, 
whereas percentiles do not. 
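
The gap between a percentile rating and the underlying scores is easy to see with a small worked example. The sketch below assumes one simple definition of the rating (the share of schools in the comparison group that a school strictly outperforms, rounded to a whole number) and uses invented scores for ten schools bunched around the middle. The function name and the data are hypothetical.

```python
def percentile_rating(score, all_scores):
    """Percent of schools in the group that this school outperformed."""
    below = sum(1 for s in all_scores if s < score)
    return round(100 * below / len(all_scores))


# Hypothetical overall scores for ten elementary schools with similar
# demographics; most are bunched within a few points of one another.
scores = [40, 58, 59, 60, 61, 62, 63, 64, 90, 95]

for s in (62, 64, 90):
    print(s, percentile_rating(s, scores))
# 62 50
# 64 70
# 90 80
```

In this invented group, two points of score separate the 50th percentile from the 70th, while 26 points separate the 70th from the 80th. A bar graph of the raw scores shows that difference at a glance; the percentile ratings alone hide it.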

We must remember, however, that individual 
schools should have their own performance 
agreements, in addition to the state standards. 
These would include other goals aligned with 
the school's particular mission and focus. 
Performance reports should give equal space 
to performance against these goals, so parents 
and others can see what the school feels is most 
important and how well it achieves those goals. 
Some schools might be just average when it 
comes to state standards but be outstanding in 
their own focus areas, whether that be music, 
art, drama, debate, STEM, student projects, 
community service, languages, leadership 
development, character development, or real- 
world internships. Performance reports should 
reflect these realities. 

On the other hand, parents have indicated their 
strong preference for brief reports— from two to 
four pages. 60 More in-depth information should 
be provided on websites identified on and linked 
to the reports. 

Finally, districts and authorizers need to give 
families help in understanding the data. They 
should publish explanatory material, as the 


D.C. Public Charter School Board does with 
its Parent Guide to Public Charter School 
Performance. They should also create or contract 
for information centers where parents can get 
help choosing a school for their children, as the 
Recovery School District does in New Orleans. 
When we buy houses, most of us use real estate 
brokers to help us sort through the plethora of 
options and make the best choice. Our decisions 
about our children's education deserve equal 
care, if not more. 


SPECIAL EDUCATION STUDENTS 

Students with severe learning disabilities should 
not be included in these measurement and 
accountability systems, for obvious reasons. 
States should create separate systems for 
them that use different indicators and give less 
weight to academic achievement and growth. 
But most students receiving special education 
services should be included. Students without 
severe disabilities may learn differently from 
others, but they can still learn. Some may need 
accommodations during tests, such as more 
time. But we don't want to exempt all students 
receiving special education services, because 
that would give schools incentives to label 
students as needing special education. It would 
also be illegal under ESSA. 

IN CONCLUSION 

The kind of system I have described may seem 
like a fantasy to those steeped in the world of 
NCLB, but such systems already exist. In Massachusetts, 
charter schools must meet minimal state 
standards, but their charters also include 
specific goals. 61 When their charter is up for 
renewal, a team of experts visits for a day and 
a half and writes a qualitative assessment 
report on the school, which the state board uses 
in making its decision. In Washington, D.C., the 
Public Charter School Board uses a performance 
framework much like I have advocated, but 
individual charters include goals specific to the 
schools and reviews include multiple site visits 
to assess the quality of schools. Denver Public 
Schools does much the same thing with its 
charters. In New Orleans, charters must meet 
minimum state standards, and both the RSD and 
OPSB do on-site reviews every year, plus a high- 
stakes review when they are up for renewal. 

All three cities are among the fastest improving 
in the nation, and Massachusetts has one of 
our highest performing charter sectors. 62 Are 
these not the kinds of results we want for all our 
children? 